Hindi Syntax Annotating Dependency, Lexical Predicate-Argument Structure, and Phrase Structure Martha Palmer (University of Colorado, USA) Rajesh Bhatt.


1 Hindi Syntax: Annotating Dependency, Lexical Predicate-Argument Structure, and Phrase Structure. Martha Palmer (University of Colorado, USA), Rajesh Bhatt (U of Massachusetts, Amherst, USA), Bhuvana Narasimhan (University of Colorado, USA), Owen Rambow (Columbia University, New York, USA), Dipti Misra Sharma (IIIT Hyderabad, India), Fei Xia (U Washington, USA)

2 Outline. A Multi-Representational Treebank for Hindi/Urdu. Three Representations: – Dependency – Proposition Bank – Phrase Structure. Constructions: – Basic Clause Structure – Unaccusative – Support Verbs

3 Dependency and Phrase Structure: Types of Trees (Reminder). Dependency: all nodes are labeled with words or empty strings. Phrase structure: leaf nodes are labeled with words or empty strings; internal nodes are labeled with nonterminal symbols (a special alphabet).
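
The contrast between the two tree types can be sketched as two small data structures. This is only an illustration: the class names, the arc labels `arc-a`/`arc-b`, and the nonterminals `X`/`Y` are placeholders invented here, not part of the treebank's annotation scheme.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DepNode:
    """Dependency node: every node carries a word (or an empty string)."""
    word: str
    # Each child is stored together with the label of the arc leading to it.
    children: List[Tuple[str, "DepNode"]] = field(default_factory=list)


@dataclass
class PSNode:
    """Phrase-structure node: internal nodes carry a nonterminal, leaves a word."""
    label: Optional[str] = None  # nonterminal symbol (internal nodes only)
    word: Optional[str] = None   # word or empty string (leaf nodes only)
    children: List["PSNode"] = field(default_factory=list)


def dep_words(node: DepNode) -> List[str]:
    """Collect the words on ALL nodes of a dependency tree (pre-order)."""
    words = [node.word]
    for _arc_label, child in node.children:
        words.extend(dep_words(child))
    return words


def ps_leaves(node: PSNode) -> List[str]:
    """Collect the words of a phrase-structure tree; only leaves carry words."""
    if not node.children:
        return [node.word]
    leaves: List[str] = []
    for child in node.children:
        leaves.extend(ps_leaves(child))
    return leaves


# Placeholder trees over the same three words (labels are hypothetical).
dep_tree = DepNode("read", [("arc-a", DepNode("Atif")), ("arc-b", DepNode("book"))])
ps_tree = PSNode(label="X", children=[
    PSNode(word="Atif"),
    PSNode(label="Y", children=[PSNode(word="book"), PSNode(word="read")]),
])
```

In the dependency tree the verb itself is the root node, whereas in the phrase-structure tree it is a leaf below two nonterminal nodes.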

4 Motivation 1: Two Representations. Both phrase-structure (PS) treebanks and dependency (DS) treebanks are used in NLP: – Collins/Charniak/Bikel parser for PS – CoNLL task on dependency parsing. Problem: currently few (no?) treebanks with PS and DS that are independently motivated. Our project: build a treebank for Hindi/Urdu for which PS and DS are linguistically motivated from the outset: – Dependency: Paninian grammar (Panini 400 BC) – Phrase structure: variant of Minimalism (Chomsky 1995)

5 Motivation 2: Two Content Levels. Everyone (?) wants syntax. Recent popularity of PropBank (Palmer et al 2002): lexical predicate-argument structure; “semantics as surfacy as it gets”. Recent experience: PropBank may inform some treebanking decisions. Our project: build a treebank with all levels from the outset.

6 The Multi-Representational Hindi/Urdu Treebanking Project. [Diagram: content (what) against representation (how), with four cells: Syntax DS, Syntax + PropBank DS, Syntax PS, Syntax + PropBank PS; steps are marked "manual" or "automatic".] Devise all levels of representation simultaneously!

7 Outline. A Multi-Representational Treebank for Hindi/Urdu. Three Representations: – Dependency – Proposition Bank – Phrase Structure. Constructions: – Basic Clause Structure – Unaccusative – Support Verbs

8 Hindi Paninian Framework (Dipti Sharma, Hyderabad). There are 6 main karakas (karaka relations): – karta (k1): the activity of the verb resides in the karta – karma (k2): the result of the verb resides in the karma – karana (k3): the instrument that helps achieve the activity of the verb – sampradaan (k4): the receiver of the action – apaadan (k5): the point of separation from which an entity has moved away in an action – adhikaran (k7): the place (k7p) or time (k7t) where the action is located
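
The label inventory on this slide can be written down as a small lookup table. This is a minimal sketch: the names `KARAKAS`, `SUBTYPES`, and `karaka_name` are invented for illustration, and only the six relations and two subtypes listed on the slide are included, not the full set of relations.

```python
# The six main karaka relations from the slide: label -> (name, short gloss).
KARAKAS = {
    "k1": ("karta", "the activity of the verb resides in it"),
    "k2": ("karma", "the result of the verb resides in it"),
    "k3": ("karana", "instrument that helps achieve the activity"),
    "k4": ("sampradaan", "receiver of the action"),
    "k5": ("apaadan", "point of separation in an action"),
    "k7": ("adhikaran", "place or time where the action is located"),
}

# Finer-grained labels mentioned on the slide, mapped to their main relation.
SUBTYPES = {"k7p": "k7", "k7t": "k7"}  # k7p = place, k7t = time


def karaka_name(label: str) -> str:
    """Return the karaka name for a label such as 'k1' or 'k7t'."""
    base = SUBTYPES.get(label, label)
    if base not in KARAKAS:
        raise KeyError(f"not one of the six main karakas: {label}")
    return KARAKAS[base][0]
```

For example, `karaka_name("k7t")` resolves the time subtype to its main relation, adhikaran.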

9 Full Set of Relations

10 Sample Paninian Analysis

11 Basic Clause Structure
अतिफ़ ने किताब को पढ़ा
Atif ne kitaab ko paRhaa
Atif Erg book Acc read.Pfv
"Atif read the book."

12 Basic Clause Structure: Dependency Structure. Root: पढ़ा. Dependents: अतिफ़-ने (k1), किताब-को (k2).
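
The dependency structure on this slide can be encoded as labeled head-dependent arcs. The sketch below uses the transliterations from the example sentence; the `Arc` type and the two helper functions are assumptions made for illustration, not the treebank's file format.

```python
from typing import Dict, List, NamedTuple


class Arc(NamedTuple):
    head: str
    dependent: str
    label: str


# "Atif ne kitaab ko paRhaa" (Atif read the book), in transliteration.
arcs: List[Arc] = [
    Arc(head="paRhaa", dependent="Atif-ne", label="k1"),    # karta
    Arc(head="paRhaa", dependent="kitaab-ko", label="k2"),  # karma
]


def find_root(arcs: List[Arc]) -> str:
    """The root is the only word that is a head but never a dependent."""
    heads = {arc.head for arc in arcs}
    dependents = {arc.dependent for arc in arcs}
    roots = heads - dependents
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {sorted(roots)}")
    return next(iter(roots))


def dependents_of(arcs: List[Arc], head: str) -> Dict[str, str]:
    """Map each arc label to the dependent it attaches to the given head."""
    return {arc.label: arc.dependent for arc in arcs if arc.head == head}
```

Here `find_root(arcs)` returns the verb, and `dependents_of(arcs, "paRhaa")` lists its k1 and k2 dependents.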

13 Outline. A Multi-Representational Treebank for Hindi/Urdu. Three Representations: – Dependency – Proposition Bank – Phrase Structure. Constructions: – Basic Clause Structure – Unaccusative – Support Verbs

14 PropBank: Lexical Semantic Annotation. Annotation on top of DS: PropBank is itself a dependency representation, but its arc labels differ from those of DS. Captures diathesis alternations: – John loaded the cart with hay. – John loaded hay on the cart. "hay" has the same relation to the predicate "load" in both sentences. PropBank annotates verb-meaning-specific verbal roles (Palmer et al 2004).
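
The point about diathesis alternations can be made concrete with a toy annotation of the two "load" sentences. The role names (`loader`, `container`, `cargo`) and the syntactic position names are descriptive placeholders chosen here; they are not the numbered roles of the actual PropBank frame for "load".

```python
from typing import Dict, Optional

# Two surface realizations of the same predicate (roles are hypothetical names).
with_variant = {
    "sentence": "John loaded the cart with hay.",
    "roles": {"loader": "John", "container": "the cart", "cargo": "hay"},
    "syntax": {"John": "subject", "the cart": "direct object", "hay": "with-PP"},
}
on_variant = {
    "sentence": "John loaded hay on the cart.",
    "roles": {"loader": "John", "container": "the cart", "cargo": "hay"},
    "syntax": {"John": "subject", "hay": "direct object", "the cart": "on-PP"},
}


def role_of(annotation: Dict, phrase: str) -> Optional[str]:
    """Return the predicate-specific role filled by a phrase, if any."""
    for role, filler in annotation["roles"].items():
        if filler == phrase:
            return role
    return None


def position_of(annotation: Dict, phrase: str) -> Optional[str]:
    """Return the surface syntactic position of a phrase, if recorded."""
    return annotation["syntax"].get(phrase)
```

"hay" changes its surface position between the two sentences, while its role with respect to "load" stays the same, which is the information the PropBank layer is meant to record.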

15 Basic Clause Structure: PropBank. Predicate: पढ़ा (roleset पढ़ना.01). Arguments: अतिफ़-ने (Arg0), किताब-को (Arg1). Roleset पढ़ना.01: Arg0 = reader; Arg1 = what is read.
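
A roleset and its use on the example sentence can be sketched as follows. The role descriptions come from the slide; the `Roleset` class and the `describe` helper are hypothetical names introduced for illustration, and the phrases are given in transliteration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Roleset:
    roleset_id: str
    roles: Dict[str, str]  # numbered argument -> verb-specific description


# Roleset from the slide: the verb "to read".
PADHNA_01 = Roleset("पढ़ना.01", {"Arg0": "reader", "Arg1": "what is read"})

# PropBank annotation of "Atif ne kitaab ko paRhaa" (Atif read the book).
annotation = {"Arg0": "Atif-ne", "Arg1": "kitaab-ko"}


def describe(roleset: Roleset, annotation: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Pair each annotated phrase with the description of its role."""
    unknown = set(annotation) - set(roleset.roles)
    if unknown:
        raise ValueError(f"labels not defined in the roleset: {sorted(unknown)}")
    return {arg: (phrase, roleset.roles[arg]) for arg, phrase in annotation.items()}
```

Calling `describe(PADHNA_01, annotation)` pairs Atif-ne with "reader" and kitaab-ko with "what is read".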

16 Phrase Structure. Inspired by the Chomskyan Principles-and-Parameters approach. Binary branching. Small number of nonterminals. Key structural assumptions: – Only two marked argument positions for verbs; all other NPs are adjuncts and can appear anywhere – Use of traces for displacement from normal position – Case assigned under c-command
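
The binary-branching constraint can be checked mechanically on a bracketed tree. The sketch below assumes that "binary branching" means no node has more than two children; the nested-tuple encoding and the bracketings and nonterminal labels (`VP`, `NP`, `V`) are hypothetical and are not the project's actual analysis.

```python
from typing import Set, Tuple, Union

# A tree is either a word (str) or a tuple: (nonterminal, child, child, ...).
Tree = Union[str, Tuple]


def is_binary_branching(tree: Tree) -> bool:
    """True if every internal node has one or two children."""
    if isinstance(tree, str):  # a leaf: a word or an empty string (trace)
        return True
    _label, *children = tree
    if not 1 <= len(children) <= 2:
        return False
    return all(is_binary_branching(child) for child in children)


def nonterminals(tree: Tree) -> Set[str]:
    """Collect the set of nonterminal symbols used in a tree."""
    if isinstance(tree, str):
        return set()
    label, *children = tree
    symbols = {label}
    for child in children:
        symbols |= nonterminals(child)
    return symbols


# Hypothetical binary bracketing of "Atif-ne kitaab-ko paRhaa".
binary_tree = ("VP", ("NP", "Atif-ne"), ("VP", ("NP", "kitaab-ko"), ("V", "paRhaa")))
# A flat, ternary alternative that violates the constraint.
flat_tree = ("VP", ("NP", "Atif-ne"), ("NP", "kitaab-ko"), ("V", "paRhaa"))
```

The binary tree passes the check and uses only three nonterminal symbols, while the flat tree is rejected because its root has three children.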

17 Basic Clause Structure: Phrase Structure

18 Outline. A Multi-Representational Treebank for Hindi/Urdu. Three Representations: – Dependency – Proposition Bank – Phrase Structure. Constructions: – Unaccusative – Support Verb Constructions

19 Unaccusatives
दरवाज़ा खुल गया
darwaaza khul gayaa
door open go.Pfv.MSg
"The door opened."

20 Unaccusative: Dependency Structure. Root: खुल गया. Dependent: दरवाज़ा (k1).

21 Unaccusative: PropBank. Predicate: खुल गया. Argument: दरवाज़ा (Arg1).
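
Putting the basic clause and the unaccusative side by side shows why the DS and PropBank labels cannot be mapped one to one. The table below only restates the labels given on the slides (in transliteration); the variable and function names are made up for this sketch.

```python
from typing import Dict, List, Set, Tuple

# (phrase, DS label, PropBank label) for each construction.
EXAMPLES: Dict[str, List[Tuple[str, str, str]]] = {
    "basic clause (paRhaa)": [
        ("Atif-ne", "k1", "Arg0"),
        ("kitaab-ko", "k2", "Arg1"),
    ],
    "unaccusative (khul gayaa)": [
        ("darwaaza", "k1", "Arg1"),
    ],
}


def propbank_labels_for(ds_label: str) -> Set[str]:
    """All PropBank labels observed for a given DS label across the examples."""
    return {
        pb_label
        for rows in EXAMPLES.values()
        for _phrase, ds, pb_label in rows
        if ds == ds_label
    }
```

The k1 dependent is Arg0 in the basic clause but Arg1 in the unaccusative, so `propbank_labels_for("k1")` contains both labels.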

22 Unaccusative: Phrase Structure

23 Outline. A Multi-Representational Treebank for Hindi/Urdu. Three Representations: – Dependency – Proposition Bank – Phrase Structure. Constructions: – Unaccusative – Support Verb Constructions

24 Support Verb Constructions
गहनें चोरी हो गये
geheneN chorii ho gaye
jewels(m) theft do go.Pfv.MPl
"The jewels got stolen."

25 Support Verb Constructions: Dependency Structure. Root: हो गये. Dependents: गहनें (k2), चोरी (pof).

26 Support Verb Constructions: PropBank. Roleset हो.sv: Arg0 = agent of true predicate; Arg1 = true predicate; Arg2 = patient of true predicate.
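
The support-verb roleset can be sketched in the same style as the earlier rolesets. The role descriptions are from the slide; the assignment of the phrases of "geheneN chorii ho gaye" to those roles is an assumption made here for illustration (the noun "theft" as the true predicate, the jewels as its patient), and the helper name is invented.

```python
from typing import Dict, List

# Roles of the support verb, as listed on the slide.
HO_SV_ROLES: Dict[str, str] = {
    "Arg0": "agent of true predicate",
    "Arg1": "true predicate",
    "Arg2": "patient of true predicate",
}

# ASSUMED annotation of "geheneN chorii ho gaye" (The jewels got stolen).
sv_annotation: Dict[str, str] = {
    "Arg1": "chorii",   # theft: the true predicate
    "Arg2": "geheneN",  # jewels: what was stolen
}


def unfilled_roles(roles: Dict[str, str], annotation: Dict[str, str]) -> List[str]:
    """Roles defined for the predicate but not realized in the sentence."""
    return sorted(set(roles) - set(annotation))
```

Under this assumed annotation the agent role stays unfilled, since the sentence does not mention who stole the jewels.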

27 Support Verb Constructions: Phrase Structure

28 Where we are now. DS and PS guidelines nearly complete. PropBank guidelines under development. Automatic conversion from DS + PropBank in progress; we have initial experimental results. Close cooperation in developing the three components is essential.

