DS-to-PS conversion Fei Xia University of Washington July 29, 2011

Presentation transcript:

Main steps in building the treebank
DS treebank:
– Tokenization
– Morphological analysis, voice, etc.
– POS tagging
– DS
PropBank: adding predicate-argument info
Automatic DS-to-PS conversion
Some manual checks to ensure the conversion works well

Outline
Important concepts
Compatibility and consistency
Handling inconsistency

Important concepts
Linguistic phenomena
Representation type
Linguistic theory
– Theoretical framework
– Linguistic analyses
Annotation guidelines

Linguistic phenomena
They are what we want to represent, including
– General concepts: e.g., which words form a phrase? What types of phrases does a language have?
– Types of relations between words or phrases (e.g., subjecthood, temporal modification)
– Specific constructions (e.g., small clause)
– Finer-grained distinctions (e.g., unergative vs. unaccusative)

Representation type
It is the type of mathematical object that is used to represent syntactic facts
Examples: DS, PS
Each representation type can decide what more specific representation devices to employ
– Labels on the arcs of a tree
– Use of empty nodes or coindexation between nodes

Linguistic theory
It explains how linguistic phenomena are represented in the chosen representation type
It has two components:
– Theoretical framework: it provides the vocabulary and constraints in which linguistic theories can be formulated, e.g., GB, LFG, LTAG, HPSG
– Linguistic analyses

Small clause

“Exceptional case-marking” analysis

“Raising-to-object” analysis

Annotation guidelines
Guideline designers need to choose the following
– Linguistic phenomena to represent
– Representation type
– Theoretical framework
– Linguistic analyses
– Descriptions
– Examples: sentences with DS or PS trees

Outline
Important concepts
Compatibility and consistency
Handling inconsistency

“Exceptional case-marking” analysis

“Raising-to-object” analysis

Implicit vs. explicit information
Certain aspects of information have to be expressed explicitly in DS, but not in PS, or vice versa
– Head in DS
– Syntactic categories of phrases in PS
Not explicitly providing info does not mean that the corresponding concepts do not exist in DS/PS

Syntactic consistency
We assume each phrase in a PS has a special word, the head word, which represents the property of the phrase.
A (DS, PS) pair is called consistent if there is a way to assign a head word to each internal node in the PS so that the resulting DS is identical to the given DS.
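The relationship between head assignment and the resulting DS can be sketched in Python. This is only an illustration under stated assumptions: the `Node` class, the `head_index` field, and the simplified tree for "Vinken will join the board" are all hypothetical (POS preterminals are omitted and each word is assumed to occur once). Given one head choice per internal node, every non-head child contributes a (dep, head) pair.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                   # category, or the word itself for a leaf
    children: list = field(default_factory=list)
    head_index: int = 0                          # which child supplies the head word


def head_word(node):
    """Follow the head child down to a leaf and return its word."""
    if not node.children:
        return node.label
    return head_word(node.children[node.head_index])


def induced_ds(node, deps=None):
    """Collect the (dep, head) pairs implied by the head choices in the PS."""
    if deps is None:
        deps = set()
    if node.children:
        head = head_word(node)
        for i, child in enumerate(node.children):
            if i != node.head_index:
                deps.add((head_word(child), head))
            induced_ds(child, deps)
    return deps


# Hypothetical, simplified PS: (S (NP Vinken) (VP will (VP join (NP the board))))
tree = Node("S", [
    Node("NP", [Node("Vinken")]),
    Node("VP", [
        Node("will"),
        Node("VP", [
            Node("join"),
            Node("NP", [Node("the"), Node("board")], head_index=1),
        ], head_index=0),
    ], head_index=1),
], head_index=1)

given_ds = {("Vinken", "join"), ("will", "join"), ("board", "join"), ("the", "board")}
print(induced_ds(tree) == given_ds)
```

With this head assignment the induced pairs match the given DS, so the pair counts as consistent in the sense defined above; a different `head_index` choice would induce a different DS.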

Consistent pairs

Inconsistent pairs

A real example

Consistency assumption

Definition of consistency
A DS and a PS are consistent iff there exists a flattened version of the PS that is identical to the DS.
If the input DS and the desired PS are consistent, the PS can be created by stretching the DS and adding syntactic labels.

Checking consistency
For each (dep, head) pair in the DS
– find their locations in the PS and their closest common ancestor
– add heads to the nodes on the paths between the leaf nodes and the ancestor
The DS and the PS are consistent iff each node in the PS has exactly one head.
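A minimal Python sketch of this check follows, under several assumptions that are not in the slides: the PS is a nested tuple `(label, child, ...)` with words as string leaves, every word occurs once, nodes are identified by their address (the tuple of child indices from the root), and the word that is never a dependent is taken to project all the way up to the root. The dependent is recorded as head of the nodes strictly below the common ancestor, and the head word as head of the nodes up to and including it.

```python
from collections import defaultdict


def leaf_paths(tree):
    """Map each word to the addresses of the nodes from its leaf up to the root."""
    paths = {}

    def walk(node, address):
        if isinstance(node, str):
            paths[node] = [address[:k] for k in range(len(address), -1, -1)]
        else:
            for i, child in enumerate(node[1:]):
                walk(child, address + (i,))

    walk(tree, ())
    return paths


def common_depth(a, b):
    """Depth of the closest common ancestor of two leaf addresses."""
    depth = 0
    while depth < min(len(a), len(b)) and a[depth] == b[depth]:
        depth += 1
    return depth


def is_consistent(tree, ds):
    paths = leaf_paths(tree)
    heads = defaultdict(set)
    for dep, head in ds:
        depth = common_depth(paths[dep][0], paths[head][0])
        for addr in paths[dep]:
            if len(addr) > depth:        # strictly below the common ancestor
                heads[addr].add(dep)
        for addr in paths[head]:
            if len(addr) >= depth:       # up to and including the ancestor
                heads[addr].add(head)
    dependents = {dep for dep, _ in ds}
    for word, path in paths.items():
        if word not in dependents:       # assumed: the DS root projects to the PS root
            for addr in path:
                heads[addr].add(word)
    all_nodes = {addr for path in paths.values() for addr in path}
    return all(len(heads[addr]) == 1 for addr in all_nodes)


# Hypothetical examples, simplified from the kinds of sentences used in the slides.
ps = ("S", ("NP", "Vinken"), ("VP", "will", ("VP", "join", ("NP", "the", "board"))))
ds = [("Vinken", "join"), ("will", "join"), ("board", "join"), ("the", "board")]

moved_ps = ("SBARQ", "who", ("S", "you", ("VP", "think", ("S", "came"))))
moved_ds = [("who", "came"), ("came", "think"), ("you", "think")]

print(is_consistent(ps, ds), is_consistent(moved_ps, moved_ds))
```

In the second pair the VP node ends up with two candidate heads, the embedded verb and "think", which is the kind of conflict that long-distance movement produces.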

(Vinken, join) (Vinken) (join) (board, join) (board) (will, join) (29, join) (29)

Outline
Important concepts
Compatibility and consistency
Handling inconsistency

wh-movement (who, come) come (come, think)

wh-movement (who, come) (come, think) come come | think come

wh-movement (who, come) (come, think) come ?? think ?? (you, think)

Can DS and PS be inconsistent?
DS and PS can represent different aspects of the same overall picture, and still be consistent.
– Info provided in PropBank: e.g., empty subject, unaccusative
– Info that is in PS only: e.g., traces
DS and PS should not choose “conflicting” analyses.
– DS and PS are two images of the same underlying treebank, not two separate treebanks.
– Ex: ba-construction in Chinese: verb, prep, or something else?
– Ex: free relatives: empty nominal head
The inconsistency cases should be rare and well-motivated.

How to handle inconsistency?
Detect inconsistency in (DS, PS) pairs in the guidelines
Consult guideline designers to determine whether the inconsistency can be resolved by changing analyses
If not, introduce DS_cons and ensure sufficient info is in DS for automatic conversion.

Two-stage conversion
DS to DS_cons: by removing “inconsistency” between DS and PS.
DS_cons to PS: by applying conversion rules

Case #1: long-distance movement
DS_const: DS_prop:
Other examples: extraposition
Easily detectable due to non-projectivity
Create DS_const by moving up the “moved element” and leaving a trace
Which node is the “moved element”? The one that is apart from other nodes in the subtree.
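Detection via non-projectivity can be sketched with the standard test: an arc is non-projective if some word lying between the dependent and its head is not a descendant of that head. The representation below (a list of head indices, with -1 marking the root) and the example attachment of "do" are assumptions made for illustration, not the treebank's format.

```python
def non_projective_arcs(heads):
    """heads[i] is the index of the head of word i, or -1 for the root.

    Returns the (dep, head) index pairs whose span contains a word
    that the head does not dominate. Assumes the heads form a tree.
    """
    def is_descendant(node, ancestor):
        while node != -1:
            if node == ancestor:
                return True
            node = heads[node]
        return False

    arcs = []
    for dep, head in enumerate(heads):
        if head == -1:
            continue
        lo, hi = sorted((dep, head))
        if any(not is_descendant(k, head) for k in range(lo + 1, hi)):
            arcs.append((dep, head))
    return arcs


# Hypothetical sentence: who(0) do(1) you(2) think(3) came(4)
# who -> came, do -> think, you -> think, think = root, came -> think
heads = [4, 3, 3, -1, 3]
print(non_projective_arcs(heads))
```

Here only the arc from "who" to "came" is flagged, because "do", "you", and "think" sit inside its span without being dominated by "came". The flagged dependent is a candidate for the “moved element”.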

Case #2: local scrambling
Detectable by assuming canonical word order: k1 > k2
Need from PS/DS teams the canonical word order and what word order triggers movement

Case #3: small clause rule
Detectable by dependency type k2s
Need confirmation from IIIT that k2s is used only for small clause

Case #4: support verb
Detectable by dependency type “pof”
Need confirmation from IIIT that “pof” is used only for support verb
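Cases #3 and #4 amount to a lookup on the dependency label. A minimal sketch, assuming the DS is available as (dependent, head, label) triples and that k2s and pof are used only for these constructions, which the slides say still needs confirmation. The word names `w1` to `w6` are placeholders, not treebank data.

```python
# Assumed mapping from dependency type to the construction it signals.
TRIGGERS = {"k2s": "small clause", "pof": "support verb"}


def flag_constructions(arcs):
    """Return (dependent, head, construction) for every arc with a trigger label."""
    return [
        (dep, head, TRIGGERS[label])
        for dep, head, label in arcs
        if label in TRIGGERS
    ]


arcs = [
    ("w1", "w4", "k1"),
    ("w2", "w4", "k2"),
    ("w3", "w4", "k2s"),
    ("w5", "w6", "pof"),
]
print(flag_constructions(arcs))
```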

Conclusion
We define consistency between DS and PS
DS and PS can be inconsistent, but such cases should be rare and well-motivated.
We will handle inconsistency with the two-stage approach

Conversion algorithm

Definition of conversion rule
A conversion rule is a (DS_pattern, PS_pattern) pair.
Simplest case:
– DS_pattern corresponds to only one dependency link
– Decomposing DS becomes trivial
– PS_pattern is a tree fragment (e.g., wh-movement)
– Learning rules from (PS, DS) pairs is easy
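For the simplest case, a rule table can be sketched as a mapping from single dependency links to tree fragments. Everything concrete here is hypothetical: the field names, the tags and relation names, and the bracketed fragments with `DEP` and `HEAD` placeholders are invented for illustration and are not rules from the treebank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DSPattern:
    dep_tag: str      # POS tag of the dependent
    head_tag: str     # POS tag of the head
    relation: str     # dependency type on the link


@dataclass(frozen=True)
class ConversionRule:
    ds_pattern: DSPattern
    ps_pattern: str   # bracketed tree fragment


def decompose(ds):
    """With one link per pattern, decomposition is one pattern per link."""
    return [DSPattern(*link) for link in ds]


def lookup(ds, rules):
    """Return the PS fragment for each link, or None if no rule matches."""
    table = {rule.ds_pattern: rule.ps_pattern for rule in rules}
    return [table.get(pattern) for pattern in decompose(ds)]


# Made-up rules and input, for illustration only.
rules = [
    ConversionRule(DSPattern("NN", "VB", "subj"), "(S (NP DEP) (VP HEAD))"),
    ConversionRule(DSPattern("NN", "VB", "obj"), "(VP HEAD (NP DEP))"),
]
ds = [("NN", "VB", "subj"), ("JJ", "NN", "mod")]
print(lookup(ds, rules))
```

A link with no matching rule comes back as `None`, which is one simple way to surface gaps in the rule set; gluing the returned fragments into a full PS is a separate step not covered by this sketch.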

Extracting rules

Rules extracted from the example

Input DS

Gluing PS segments together